58 research outputs found

    Weighted Sparse Partial Least Squares for Joint Sample and Feature Selection

    Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance. However, sPLS extracts the combinations between two data sets using all data samples, so it cannot detect latent subsets of samples. To extend the application of sPLS by identifying a specific subset of samples and removing outliers, we propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constraints are used to select a subset of samples. We prove that the $\ell_\infty/\ell_0$-norm constraints have the Kurdyka-Łojasiewicz property, so that a globally convergent algorithm can be developed to solve the problem. Moreover, multi-view data with the same set of samples are available in various real problems. To this end, we extend the $\ell_\infty/\ell_0$-wsPLS model and propose two multi-view wsPLS models for multi-view data fusion. We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property. Numerical and biomedical data experiments demonstrate the efficiency of the proposed methods.

    miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase

    Background: miRBase is the primary repository for published miRNA sequence and annotation data, and serves as the “go-to” place for miRNA research. However, the definition and annotation of miRNAs have changed significantly across different versions of miRBase. These changes cause inconsistency in miRNA-related data between different databases and articles published at different times. Several tools have been developed for querying and converting miRNA information between different miRBase versions, but none of them individually provides comprehensive information about miRNAs in miRBase, so users need a number of different tools in their analyses.
    Results: We introduce miRBaseConverter, an R package integrating the latest miRBase version 22, available in Bioconductor, that provides a suite of functions for converting and retrieving miRNA name (ID), accession, sequence, species, version and family information in different versions of miRBase. The package is implemented in R and available under the GPL-2 license from the Bioconductor website (http://bioconductor.org/packages/miRBaseConverter/). A Shiny-based GUI suitable for non-R users is also available as a standalone application from the package and as a web application at http://nugget.unisa.edu.au:3838/miRBaseConverter . miRBaseConverter has a built-in database for querying miRNA information in all species and for both pre-mature and mature miRNAs defined by miRBase. In addition, it is the first tool for batch querying of miRNA family information. The package aims to provide a comprehensive and easy-to-use tool for the miRNA research community, where researchers often use published miRNA data from different sources.
    Conclusions: The Bioconductor package miRBaseConverter and the Shiny-based web application are presented to provide a suite of functions for converting and retrieving miRNA name, accession, sequence, species, version and family information in different versions of miRBase. The package will serve a wide range of applications in miRNA research and could provide a full view of the miRNAs of interest. https://deepblue.lib.umich.edu/bitstream/2027.42/146768/1/12859_2018_Article_2531.pd

    Deep-agriNet: a lightweight attention-based encoder-decoder framework for crop identification using multispectral images

    The field of computer vision has shown great potential for the identification of crops at large scales based on multispectral images. However, the challenge in designing crop identification networks lies in striking a balance between accuracy and a lightweight framework. Furthermore, there is a lack of accurate recognition methods for non-large-scale crops. In this paper, we propose an improved encoder-decoder framework based on DeepLab v3+ to accurately identify crops with different planting patterns. The network employs ShuffleNet v2 as the backbone to extract features at multiple levels. The decoder module integrates a convolutional block attention mechanism that combines channel and spatial attention to fuse attention features across the channel and spatial dimensions. We establish two datasets, DS1 and DS2, where DS1 is obtained from areas with large-scale crop planting, and DS2 is obtained from areas with scattered crop planting. On DS1, the improved network achieves a mean intersection over union (mIoU) of 0.972, overall accuracy (OA) of 0.981, and recall of 0.980, improvements of 7.0%, 5.0%, and 5.7%, respectively, over the original DeepLab v3+. On DS2, the improved network improves the mIoU, OA, and recall by 5.4%, 3.9%, and 4.4%, respectively. Notably, the number of parameters and giga floating-point operations (GFLOPs) required by the proposed Deep-agriNet are significantly smaller than those of DeepLab v3+ and other classic networks. Our findings demonstrate that Deep-agriNet performs better in identifying crops with different planting scales, and can serve as an effective tool for crop identification in various regions and countries.
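    For readers unfamiliar with the three metrics reported in this abstract, the following is a minimal sketch of how mIoU, OA, and recall are commonly computed from a per-class confusion matrix. It is not the authors' evaluation code: the function name `segmentation_metrics`, the row/column convention, and the choice of macro-averaged recall are assumptions made for illustration.

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute mIoU, overall accuracy, and macro recall.

    Assumption: conf[i, j] counts pixels whose true class is i and
    whose predicted class is j, and every class occurs at least once
    (otherwise the divisions below would hit zero denominators).
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)               # correctly labelled pixels per class
    fp = conf.sum(axis=0) - tp       # predicted as the class, but not truly it
    fn = conf.sum(axis=1) - tp       # truly the class, but predicted otherwise
    iou = tp / (tp + fp + fn)        # per-class intersection over union
    recall = tp / (tp + fn)          # per-class recall
    return {
        "mIoU": float(iou.mean()),
        "OA": float(tp.sum() / conf.sum()),
        "recall": float(recall.mean()),
    }

if __name__ == "__main__":
    # Toy two-class confusion matrix (made-up numbers, not from the paper).
    print(segmentation_metrics([[3, 1], [1, 5]]))
```

    Averaging per-class IoU and recall gives each crop class equal weight regardless of its pixel count, whereas OA is dominated by the most common classes; this is one reason abstracts of this kind tend to report all three.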

    A novel RUNX2 missense mutation predicted to disrupt DNA binding causes cleidocranial dysplasia in a large Chinese family with hyperplastic nails

    Background: Cleidocranial dysplasia (CCD) is a dominantly inherited disease characterized by hypoplastic or absent clavicles, large fontanels, dental dysplasia, and delayed skeletal development. The purpose of this study is to investigate the genetic basis of CCD in a Chinese family. Methods: A large Chinese family with CCD and hyperplastic nails was recruited. The clinical features displayed significant intrafamilial variation. We sequenced the coding region of the RUNX2 gene for mutation and phenotype analysis. Results: The family carries a c.T407C (p.L136P) mutation in the DNA- and CBF beta-binding Runt domain of RUNX2. Based on the crystal structure, we predict that this novel missense mutation is likely to disrupt DNA binding by RUNX2 and to affect the Runt domain structure at least locally. Conclusion: A novel missense mutation was identified in a large Chinese family with CCD and hyperplastic nails. This report further extends the mutation spectrum and clinical features of CCD. The identification of this mutation will facilitate prenatal diagnosis and preimplantation genetic diagnosis.

    A novel single-cell based method for breast cancer prognosis

    Breast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performance or ambiguous biological relevance because they neglect intra-tumor heterogeneity. Recently, single-cell RNA sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at the cellular level. In this paper, we propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It first infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and on their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, we use them as features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of biological processes other than EMT.
    Xiaomei Li, Lin Liu, Gregory J. Goodall, Andreas Schreiber, Taosheng Xu, Jiuyong Li, Thuc D. L

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewe

    Reflectance Spectroscopy with Multivariate Methods for Non-Destructive Discrimination of Edible Oil Adulteration

    Visible and near-infrared (Vis-NIR) reflectance spectroscopy was used for the rapid and nondestructive discrimination of edible oil adulteration. In total, 110 samples of sesame oil and rapeseed oil adulterated with soybean oil at different levels were produced to obtain reflectance spectra over 350–2500 nm. A set of multivariate methods was applied to identify adulteration types and adulteration rates. In the qualitative analysis of adulteration type, the support vector machine (SVM) method yielded high overall accuracy with multiple spectral pretreatments. In the quantitative analysis of adulteration rate, random forest (RF) combined with multivariate scattering correction (MSC) achieved the highest identification accuracy with the full wavelengths of the Vis-NIR spectra. The effective wavelengths of the Vis-NIR spectra were screened to improve the robustness of the multivariate methods. The results suggested that competitive adaptive reweighted sampling (CARS) was helpful for removing redundant information from the spectral data and improving the prediction accuracy. The PLSR + MSC + CARS model achieved the best prediction performance in the two adulteration cases of sesame oil and rapeseed oil. The coefficient of determination (R²Pcv) and the root mean square error (RMSEPcv) of the prediction set were 0.99656 and 0.01832, respectively, for sesame oil adulterated with soybean oil, and 0.99675 and 0.01685, respectively, for rapeseed oil adulterated with soybean oil. Vis-NIR reflectance spectroscopy with the assistance of multivariate analysis can effectively discriminate the different adulteration rates of edible oils.
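    The two figures of merit quoted in this abstract, the coefficient of determination and the root mean square error of prediction, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not taken from the paper; the function name `r2_and_rmse` and the toy adulteration rates are hypothetical, and the cross-validation procedure behind the reported values is not reproduced here.

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for predicted versus reference values.

    Assumption: y_true is not constant, so the total sum of squares
    is non-zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(resid ** 2))
    return float(r2), float(rmse)

if __name__ == "__main__":
    # Made-up adulteration rates (fractions), for illustration only.
    reference = [0.1, 0.2, 0.3, 0.4]
    predicted = [0.1, 0.2, 0.3, 0.5]
    print(r2_and_rmse(reference, predicted))
```

    The RMSE is in the same units as the predicted quantity, so its interpretation depends on how the adulteration rate is scaled, which the abstract does not state.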

    Exploration on the origin of enhanced piezoelectric properties in transition-metal ion doped KNN based lead-free ceramics

    In this work, we studied the effects of Ni2O3 and Co2O3 doping on the crystal structures, microstructures, orthorhombic-tetragonal phase transition temperature (To-t), and electrical properties of [Li0.06(Na0.57K0.43)0.94][Ta0.05(Sb0.06Nb0.94)0.95]O3 (LNKTSN) lead-free ceramics. The experimental results showed that adding an appropriate amount of Ni2O3 could shift To-t downward to room temperature, thereby markedly increasing the room-temperature piezoelectric coefficient (d33), dielectric coefficient (εr) and electromechanical coupling coefficient (kp) of the LNKTSN ceramics. These results were consistent with previous experimental results obtained for Fe2O3-doped LNKTSN ceramics. In contrast, Co3+ doping continuously shifted To-t upward and clearly degraded the piezoelectric properties of the LNKTSN ceramics. Fe, Co and Ni have similar ionic radii and were expected to produce the same (donor or acceptor) doping effects on the electrical properties of LNKTSN ceramics. The different doping effects of Co3+ (deterioration) and Ni3+ or Fe3+ (improvement) on the electrical properties of LNKTSN ceramics suggest that the coexistence of orthorhombic and tetragonal phases at room temperature due to the downward shift of To-t, rather than ion doping (donor or acceptor doping) effects, was the main cause of the enhanced room-temperature piezoelectric properties. This conclusion can be extended to KNN-based materials in general, thus offering a guiding principle for the future development of new lead-free materials with good piezoelectric properties.